05:00
This slide deck was built in Quarto!
You can click on a table’s name to open it to the right. Take a look at the first three tabs available, which will be available for every table.
Look through the tables in the Broadstreet ADI data and the CDC Births data to understand what the data is capturing, and identify the following:
05:00
This is the pattern we’ll come back to again and again in SQL (Structured Query Language):
SELECT + FROM + WHERE;
| cabin | high_score | wins | losses |
|---|---|---|---|
| Yellow | 10 | 3 | 0 |
| Blue | 7 | 2 | 1 |
| Orange | 9 | 1 | 2 |
For example: SELECT cabin, high_score FROM camp_kickball_tournament WHERE high_score < 10
Aliases allow you to (re)name things:
SELECT
cabin,
high_score,
wins/(wins + losses) AS prop_win
FROM camp_kickball_tournament
WHERE high_score < 10;
Which is easier to read?
SELECT
cabin,
high_score,
wins/(wins + losses) AS prop_win
FROM camp_kickball_tournament
WHERE high_score < 10;
select cabin,high_score,wins/(wins + losses) as prop_win from camp_kickball_tournament where high_score < 10;
Use the “Query” button (or the one with the magnifying glass) when you’re looking within a table to avoid annoying syntax issues like having to type fully qualified table names (project.dataset.table).
Once you click “Query Table” (and then “New Tab” or whatever you prefer), you’ll see a partial query, with the cursor located so that you can add fields after SELECT.
You’ll also see warnings that your query is invalid. That’s expected!
sdoh_cdc_wonder_natality) called county_natality.03:00
Let’s use the AI assistant to learn some SQL!
I asked you before to find three things, and now we’ll use them in a JOIN in SQL.
What to join?
What constitutes a join?
Which type of join / data completion
INNER JOIN (or just JOIN)
LEFT (or LEFT OUTER) JOIN
RIGHT (or RIGHT OUTER) JOIN
FULL (or FULL OUTER) JOIN
The basic syntax of a join is:
SELECT [fields we want]
FROM [left table name] [some join type] [right table name]
ON (or USING) [join criteria]
We have multiples of counties, with varying ADI scores, in our query results. That’s not great – we want each county to appear only once.
What’s going on? Dates.
Gemini gets us the start of a query, but it won’t quite work. But it does give you a nice look at aliasing!
This is what Gemini gave me:
SELECT
t1.Ave_Birth_Weight_gms,
t2.area_deprivation_index_percent
FROM
`bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` AS t1
INNER JOIN `bigquery-public-data.broadstreet_adi.area_deprivation_index_by_county` AS t2 ON t1.County_of_Residence_FIPS = t2.county_fips_code
AND t1.Year = t2.year
Error!
No matching signature for operator = for argument types: DATE, INT64. Supported signature: ANY = ANY at [7:10]
DATE is a type that has year, day, and month. But if I just have a four digit year, that’s an integer (INT64).
I can’t compare an INT column that has “2019” to a DATE column that has “2019-01-01”
Include what you’re trying to do and “in BigQuery.”
In our case, EXTRACT seems likely to work.
SELECT
cdc.Ave_Birth_Weight_gms,
adi.area_deprivation_index_percent
FROM
`bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` AS cdc
INNER JOIN `bigquery-public-data.broadstreet_adi.area_deprivation_index_by_county` AS adi
ON cdc.County_of_Residence_FIPS = adi.county_fips_code AND
EXTRACT(YEAR FROM cdc.Year) = adi.year
SELECT
cdc.Ave_Birth_Weight_gms,
adi.area_deprivation_index_percent,
cdc.County_of_Residence,
adi.county_name,
cdc.Year,
adi.year
FROM
`bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` AS cdc
INNER JOIN `bigquery-public-data.broadstreet_adi.area_deprivation_index_by_county` AS adi
ON cdc.County_of_Residence_FIPS = adi.county_fips_code AND
EXTRACT(YEAR FROM cdc.Year) = adi.year
One last thing – we now are matching on year, but what if we have data on 2018, 2019, and 2020 for a given county, in both tables? We’ll match that data up properly, but some counties might be over-represented in our analysis.
In your Query results, you can save your work. In our case, we want to save it as a BigQuery table, in our project.
Let’s take a break and then come back to work in R!
10:00
Joy Payton, Children’s Hospital of Philadelphia